String Matching Methodologies: A Comparative Analysis
Authors
Abstract
String matching is the problem of finding all occurrences of a character pattern in a text. This paper provides an overview of different string matching algorithms and a comparative study of these algorithms. We have evaluated several algorithms, such as the Naive string matching algorithm, the Brute Force algorithm, the Rabin-Karp algorithm, the Boyer-Moore algorithm, the Knuth-Morris-Pratt algorithm, the Aho-Corasick algorithm and the Commentz-Walter algorithm. We analysed the core ideas of these single pattern and multi-pattern string matching algorithms. We compared the matching efficiencies of these algorithms by searching speed, pre-processing time, matching time and the key ideas used in each algorithm. It is observed that the performance of a string matching algorithm depends on the algorithm selected and also on network bandwidth.

Keywords: String matching, Naive Search, Rabin Karp, Boyer-Moore, KMP, Exact String Matching, Approximate String Matching, Comparison of String Matching Algorithms.

I. INTRODUCTION

String matching is a technique for finding a pattern in a given text. Let ∑ be an alphabet. Elements of ∑ are called symbols or characters. For example, if ∑ = {a, b}, then abab is a string over ∑. The pattern is denoted by P[1..m] and the text by T[1..n]. If P occurs with shift s in T, that is, if T[s+1..s+m] = P[1..m] for some 0 ≤ s ≤ n-m, then we call s a valid shift; otherwise, we call s an invalid shift. The string matching problem is the problem of finding all valid shifts with which a given pattern P occurs in a given text T [1]. Figure 1 shows this definition [2].

Figure 1: String Matching Example

II. EXACT STRING MATCHING ALGORITHMS

Exact string matching is used to search for any occurrence of a string A in a string B. These algorithms are applied in biology, especially in the analysis of DNA chains [5]. Much of the data processing in bioinformatics involves, in one way or another, recognising certain patterns within DNA, RNA or protein sequences.

A. Single pattern string matching algorithms

1) Naive string matching algorithm: It is also known as the Brute Force algorithm. It has no pre-processing phase and needs only constant extra space. It always shifts the window by exactly one position to the right. It requires 2n expected text character comparisons. It finds all valid shifts using a loop that checks the condition P[1..m] = T[s+1..s+m] for each of the n-m+1 possible values of s. Consider the following example: T = ANPANMAN, P = MAN. A brute force method for string matching is shown in Figure 2.

Figure 2: Naive String Matching Example

The Naive string matching algorithm takes time O((n-m+1)m), and this bound is tight in the worst case. The worst case running time is thus Θ((n-m+1)m) [4]. The running time of the Naive string matching algorithm is equal to its matching time, since there is no pre-processing.

2) Rabin Karp String Matching Algorithm: This algorithm uses a hashing function. It works in two phases: a pre-processing phase (time complexity Θ(m)) and a matching phase (average time complexity Θ(n+m), worst case Θ((n-m+1)m)) [4]. The Rabin Karp matcher is used to find a numeric pattern P in a given text T. It first divides the pattern by a predefined prime number q to calculate the remainder of pattern P. Then it takes the first m characters of text T at the first shift s and computes the remainder of those m characters. Only if the remainder of the pattern P and the remainder of the text window are equal do we compare the text with the pattern; otherwise there is no need for comparison. We repeat the process for the next set of characters from the text for all possible shifts, from s = 0 to n-m. So, according to this, two numbers n1

Akhtar Rasool, Amrita Tiwari et al. / (IJCSIT) International Journal of Computer Science and Information Technologies, Vol. 3 (2), 2012, 3394-3397
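The two single pattern matchers described above can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the function names, the radix d = 256 and the prime q = 101 are assumptions chosen for the example, and shifts are reported 0-based, so a returned s means the pattern equals T[s+1..s+m] in the paper's 1-based notation.

```python
def naive_match(text: str, pattern: str) -> list[int]:
    """Return every valid shift s at which pattern occurs in text."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []
    shifts = []
    # Try each of the n - m + 1 candidate shifts, moving one position at a time.
    for s in range(n - m + 1):
        if text[s:s + m] == pattern:
            shifts.append(s)
    return shifts


def rabin_karp_match(text: str, pattern: str, q: int = 101, d: int = 256) -> list[int]:
    """Compare remainders modulo q first; compare characters only on a hit.

    q (prime modulus) and d (radix) are illustrative defaults,
    not values taken from the paper.
    """
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []
    h = pow(d, m - 1, q)  # weight of the window's leading character, mod q
    p_hash = t_hash = 0
    # Pre-processing, Theta(m): remainders of the pattern and of the first window.
    for i in range(m):
        p_hash = (d * p_hash + ord(pattern[i])) % q
        t_hash = (d * t_hash + ord(text[i])) % q
    shifts = []
    for s in range(n - m + 1):
        # Equal remainders can be a spurious hit, so verify the characters.
        if p_hash == t_hash and text[s:s + m] == pattern:
            shifts.append(s)
        if s < n - m:
            # Roll the window: drop text[s], append text[s + m].
            t_hash = (d * (t_hash - ord(text[s]) * h) + ord(text[s + m])) % q
    return shifts


print(naive_match("ANPANMAN", "MAN"))       # [5]
print(rabin_karp_match("ANPANMAN", "AN"))   # [0, 3, 6]
```

Both functions return the same shifts for any input. The difference is that the Rabin Karp sketch performs a character-by-character comparison only when the two remainders agree, which is why a small q raises the number of spurious hits but never changes the result.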
Similar resources
Multiple Pattern String Matching Methodologies: A Comparative Analysis
The use of string matching algorithms in software applications like virus scanners (anti-virus) or intrusion detection systems is stressed for improving data security over the internet. String-matching techniques are used for sequence analysis, gene finding, evolutionary biology studies and analysis of protein expression. Other fields such as Music Technology, Computational Linguistics, Artificial Intelli...
Probabilistic Comparative String Analysis
Comparative string data has proven to be a valuable resource for improving the accuracy of computational methods for string analysis. In this report we describe the characteristics of comparative string data, focusing on biological sequences, and natural language text. We then describe a general probabilistic framework for analyzing pairs of strings, show how posterior based methods can be used...
Parallel String Matching with Multi Core Processors-A Comparative Study for Gene Sequences
The amount of data is clearly growing because of the need to store more information. Extracting certain data from such a large database is a very difficult task that involves text processing, information retrieval, text mining, pattern recognition and DNA sequencing. So we need concurrent events and high performance computing models for extracting the data. This w...
Signal Analysis on Strings for Immune-Type Pattern Recognition
We use wavelet-type discrete transforms for signal analysis on strings of finite length. We apply these transforms for edge and hidden Markov process detection. We also present new approaches for string matching and for measures of the diversity of chaotic strings.
A Comparative Study of Wu Manber String Matching Algorithm and its Variations
String matching algorithms have become one of the most important topics in computer science. These algorithms are used in many real-world problems, such as scanning for threats in intrusion detection systems, finding patterns in text mining, matching document similarity in plagiarism detection systems, recognition in bioinformatics and so on. String Matching Algorithms are br...
Publication date: 2012